This workshop is meant to provide an overview of tools available in R for the analysis of geolocated data. This type of data is becoming increasingly common in various fields (e.g. aerial photos, satellite imagery, census data, geolocated posts on social networks, etc.). There are two broad categories of geospatial data: raster data represent variables defined at every point of a grid covering the full extent of the data (as in satellite images), whereas vector data associate variables to discrete objects located in space (such as the position of cities and highways on a road map).

Initially, using programming commands to process geographical data can seem less intuitive, compared with graphical user interface of specialized software, such as ArcGIS. Here are a few advantages of scripted analyses.

Objectives

Contents


Explore a vector dataset

The mrc folder contains coordinates for the Québec regional county municipalities (abbreviated MRC in French) in a ESRI shapefile format (data source). Note that the information for each dataset is spread across multiple files, which share a name but differ in their file extension.

To load a dataset in R, we will call the st_read function (all sf package functions start with the prefix st_, standing for spatio-temporal) with the .shp file name.

library(sf)
## Linking to GEOS 3.6.1, GDAL 2.2.3, proj.4 4.9.3

mrc <- st_read("mrc/mrc_polygone.shp", stringsAsFactors = FALSE)
## Reading layer `mrc_polygone' from data source `C:\Users\marchanp\Desktop\atelier_rgeo\mrc\mrc_polygone.shp' using driver `ESRI Shapefile'
## Simple feature collection with 137 features and 12 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: -79.7625 ymin: 44.99136 xmax: -56.93495 ymax: 62.58217
## epsg (SRID):    4269
## proj4string:    +proj=longlat +datum=NAD83 +no_defs

The output text indicates the main properties of the loaded dataset, including the geometric type (POLYGON), the spatial extent of the data (bbox) and the coordinate reference systme (CRS) in use. That CRS is described in two formats: a EPSG code and a proj4string character string. We will discuss coordinate systems in the next section.

These properties can be accessed separately using the st_bbox and st_crs functions:

st_bbox(mrc)
##      xmin      ymin      xmax      ymax 
## -79.76250  44.99136 -56.93495  62.58217
st_crs(mrc)
## Coordinate Reference System:
##   EPSG: 4269 
##   proj4string: "+proj=longlat +datum=NAD83 +no_defs"

Let’s look at the first few rows of the data:

class(mrc)
## [1] "sf"         "data.frame"
head(mrc)
## Simple feature collection with 6 features and 12 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: -79.7625 ymin: 48.92349 xmax: -63.31941 ymax: 62.58217
## epsg (SRID):    4269
## proj4string:    +proj=longlat +datum=NAD83 +no_defs
##           AREA  PERIMETER MRC_S_ MRC_S_ID     MRS_NO_IND
## 1 7.828380e+01 77.0855365      2        1 50 02 0040 000
## 2 4.743960e-02  1.6709096      3        2 50 02 0040 000
## 3 4.502033e+01 55.2806743      4        3 50 02 0040 000
## 4 7.530857e-04  0.1519356      5        4 50 02 0010 000
## 5 1.024437e+01 23.4016618      6        5 50 02 0010 000
## 6 5.028808e-01  8.8274338      7        6 50 02 0010 000
##                                     MRS_DE_IND MRS_CO_MRC
## 1 Municipalité régionale de comté géographique        992
## 2 Municipalité régionale de comté géographique        993
## 3 Municipalité régionale de comté géographique        991
## 4   MRC (incluant les territoires autochtones)        972
## 5   MRC (incluant les territoires autochtones)        972
## 6   MRC (incluant les territoires autochtones)        972
##                         MRS_NM_MRC MRS_CO_REG     MRS_NM_REG MRS_CO_REF
## 1 Administration régionale Kativik         10 Nord-du-Québec     BDGA1M
## 2         Nouveau toponyme à venir         10 Nord-du-Québec     BDGA1M
## 3                          Jamésie         10 Nord-du-Québec     BDGA1M
## 4                      Caniapiscau         09      Côte-Nord     BDGA1M
## 5                      Caniapiscau         09      Côte-Nord     BDGA1M
## 6                      Caniapiscau         09      Côte-Nord     BDGA1M
##   MRS_CO_VER                       geometry
## 1   V2017-08 POLYGON ((-77.82293 55.2603...
## 2   V2017-08 POLYGON ((-77.82293 55.2603...
## 3   V2017-08 POLYGON ((-78.49988 55.0008...
## 4   V2017-08 POLYGON ((-66.6878 55.00005...
## 5   V2017-08 POLYGON ((-70.02987 55.0000...
## 6   V2017-08 POLYGON ((-66.25978 55.0000...

A sf object is in fact a specialized data.frame, where each line contains data associated with a geometric object. The latter is described in the geometry column. THe most common geographical types are:

The plot function applied to a sf object creates a map for each field in the dataset.

plot(mrc)
## Warning: plotting the first 10 out of 12 attributes; use max.plot = 12 to
## plot all

To plot a single variable, you need to select the corresponding column. To show a map without data variables, you can select the geometry column. The axes = TRUE parameter indicates to the program that coordinate axes much be shown.

plot(mrc[, "geometry"], axes = TRUE)

Exercise 1

Produce a map with MRCs colored by administrative region, e.g.: Nord-du-Québec, Côte-Nord, etc.

Solution


The st_as_sf function can convert a data.frame containing geographical coordinates into a sf object. You need to specify which columns correspond to the coordinates and which coordinate reference système is used.

Here, let’s create a dataset containing a single point at the position of UQAT, with the same coordinate system as mrc (longitude and latitude).

uqat <- data.frame(nom = "UQAT", long = -79.0086, lat = 48.2306)
uqat <- st_as_sf(uqat, coords = c("long", "lat"), crs = st_crs(mrc))

The sf package includes various functions to compare geometric data. To determine which polygon in mrc contains the uqat point, we use the st_within function.

st_within(uqat, mrc)
## although coordinates are longitude/latitude, st_within assumes that they are planar
## Sparse geometry binary predicate list of length 1, where the predicate was `within'
##  1: 56

The warning reminds us that geometric operations in that package are based on a xy plane rather than spherical coordinates (more details on this below).

The result of a spatial comparison between geolocated data is a list which, for each object in the first dataset, contains the indices of objects in the second dataset to which the comparison applies. Here, the first (and only) point in uqat is within polygon 56 of mrc. We can check which MRC it is:

mrc[56, ]
## Simple feature collection with 1 feature and 12 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: -79.51763 ymin: 47.70263 xmax: -78.22 ymax: 48.57511
## epsg (SRID):    4269
## proj4string:    +proj=longlat +datum=NAD83 +no_defs
##         AREA PERIMETER MRC_S_ MRC_S_ID     MRS_NO_IND
## 56 0.7832265  4.624354     57       58 50 02 0040 000
##                                      MRS_DE_IND MRS_CO_MRC    MRS_NM_MRC
## 56 Municipalité régionale de comté géographique         86 Rouyn-Noranda
##    MRS_CO_REG            MRS_NM_REG MRS_CO_REF MRS_CO_VER
## 56         08 Abitibi-Témiscamingue     BDGA1M   V2017-08
##                          geometry
## 56 POLYGON ((-79.51749 48.4306...

To show multiple datasets on the same map, we first call plot with the argument reset = FALSE, then we add another dataset with the argument add = TRUE.

plot(mrc[, "geometry"], reset = FALSE, axes = TRUE)
plot(uqat, add = TRUE, pch = 20, col = "red")

Review

  • Spatial vector data associate data fields to localized geometric objects such as points, lines and polygons. The sf package can be used to work with those datasets in R.
  • To read a vector dataset: st_read.
  • To convert a regular data.frame into a spatial object: st_as_sf.
  • All basic data.frame operations also apply to sf objects.
  • The function plot applid to a sf objects displays one or many data fields on a map.
  • To check inclusion of one geometry within another: st_whithin.

Coordinate reference systems and transformations

Until now, we worked with data using a geographic coordinate system, where positions are expressed in longitude and latitude degrees. Those coordinates are based on a model that approximates the irregular surface of the Earth’s mean sea level (the geoid) as an ellipsoid (a slightly flattened sphere). There are several such models, referred to as datum. The mrc coordinates were defined based on the NAD83 system, whereas many world maps are based on the WGS84 system.

As mentioned earlier, the sf package functions are based on plane geometry. Those functions, whether they calculate the intersection between two lines or polygons, the distance between points, etc. ignore the curvature of the Earth and will produce erroneous results if applied to spherical coordinates. Note that the geosphere package provides functions to calculate distances and geodesic curves in accordance with the ellipsoid model.

A projection converts geographical coordinates in planar coordinates. Since it is impossible to exactly represent the full Earth’s surface on a plane, specialized projections were developed for different regions of the world and different analytical applications.

For example, the images below show how identical circular areas appear at different points of the Earth under a Mercator projection (which conserves shape) and a Lambert equal-area projection (which conserves area).

Mercator projection Lambert equal-area projection

We will convert the mrc polygons and the uqat point into a Lambert conical conformal projection centered on Quebec (EPSG:6622), using st_transform.

mrc_proj <- st_transform(mrc, 6622)
uqat_proj <- st_transform(uqat, 6622)

uqat_proj
## Simple feature collection with 1 feature and 1 field
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -774896.9 ymin: 527230.8 xmax: -774896.9 ymax: 527230.8
## epsg (SRID):    6622
## proj4string:    +proj=lcc +lat_1=60 +lat_2=46 +lat_0=44 +lon_0=-68.5 +x_0=0 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
##    nom                   geometry
## 1 UQAT POINT (-774896.9 527230.8)

It is important to always use st_transform to convert datasets between different coordinate systems. A common error consists in modifying the coordinate system of a dataset (for example, with st_crs) without transforming the data themselves.

Note that the projected coordinates are expressed in meters (“+units=m” in the proj4string). These coordinates are relative to a point of origin defined by the projection.

plot(mrc_proj[, "geometry"], axes = TRUE)

To create a map with latitude and longitude lines superposed on projected data, we can specify a geographical coordinate system (here, based on the original data) to the graticule argument:

plot(mrc_proj[, "geometry"], axes = TRUE, reset = FALSE, 
     graticule = st_crs(mrc))
plot(uqat_proj, add = TRUE, pch = 20, col = "red")

Review

  • Geographic coordinates systems are based on a datum (model of the Earth’s shape) and describe the position in terms of spherical coordinates (longitude, latitude) measured in degrees.
  • Projected coordinate systems converts spherical coordinates in planar coordinates (x, y) measured in meters.
  • st_crs returns the coordinate system of a sf object; st_transform converts the data from one coordinate system to another.

Geometric operations on vector data

Now that our data have been projected, let’s use the st_buffer function to define a 100km radius around the uqat point, which we will add to the map:

rayon <- st_buffer(uqat_proj, dist = 100000)

plot(mrc_proj[, "geometry"], axes = TRUE, reset = FALSE, 
     graticule = st_crs(mrc))
plot(uqat_proj, add = TRUE, pch = 20, col = "red")
plot(rayon[, "geometry"], add = TRUE, border = "blue")

Question: How many sides are there in the rayon polygon? (Look at the geometry column.) This is the default value, but there is an optional parameter in st_buffer that modifies the resolution of the buffer’s circular segments.

Exercise 2

The st_intersection(A, B) function returns the intersection of two vector datasets A and B, that is, the areas where objects in each dataset overlap. Using this function, determine how many MRCs are located within 100 km of the point uqat, and try to show on a map representing this circular area.

Solution


Pour each polygon in the intersection of the two datasets, st_intersection copies the data fields from the two original polygons. The warning (“attribute variables are assumed to be spatially constant”) reminds us that those variables might not match the new geometries.

For example, the AREA column in mrc refers to the area of the full MRC – also, it is expressed in square degrees, which is not a useful metric. For projected coordinates, the st_area function returns the polygon area in square meters.

Among the other geometric operations supported by sf, we will mention st_union, which combines multiple geometric objects in one geometry, as well as st_difference, which removes from one dataset the areas covered by a second dataset.

poly_union <- st_union(mrc_proj)
plot(poly_union)

poly_diff <- st_difference(mrc_proj, rayon)
## Warning: attribute variables are assumed to be spatially constant
## throughout all geometries
plot(poly_diff[, "geometry"])

Review

  • All geometric operations in the sf package are based on planar geometry. They treat longitude and latitude as if they were perpendicular axes (x, y).
  • st_buffer creates a buffer zone at a certain distance around a given object.
  • st_distance calculates the distance between two objects; st_area calculates the area of an object.
  • st_union combines all geometric objects of a vector dataset.
  • st_intersection(A, B) produces a dataset containing all regions where objects in A and B overlap.
  • st_difference(A, B) removes from A the areas also covered by B.

Exercise 3

Forest tent caterpillar in the MRC of Rouyn-Noranda

The livree folder contains data on the defoliation due to the forest tent caterpillar (Malacosoma disstria) in Québec between 2014 and 2017 (data source from Données Québec).

livree <- st_read("livree/Livree_2014_2017.shp")
## Reading layer `Livree_2014_2017' from data source `C:\Users\marchanp\Desktop\atelier_rgeo\livree\Livree_2014_2017.shp' using driver `ESRI Shapefile'
## Simple feature collection with 2944 features and 4 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -79.57113 ymin: 45.00835 xmax: -71.60867 ymax: 48.74596
## epsg (SRID):    4269
## proj4string:    +proj=longlat +datum=NAD83 +no_defs

For each defoliation polygon, the data indicates the year, the area in hectares (SupHaCea), along with the intensity of defoliation represented as a text variable (Niveau) and a numerical index (Ia): Léger (light, 1), Modéré (moderate, 2) or Grave (severe, 3).

We will first extract the 2017 data only. Since a sf object is a type of data.frame, we can use the normal R methods to filter the data based on one variable.

livree2017 <- livree[livree$ANNEE == 2017, ]

For this exercise, you must:

  • Determine the total area (in hectares) defoliated by the forest tent caterpillar in the MRC of Rouyn-Noranda.
  • Produce a map of the defoliated areas with their intensity level for that MRC.

Suggested steps:

  • Extract the Rouyn-Noranda MRC from the mrc dataset.
  • Extract data from livree2017 located within the MRC. Before applying spatial comparison operators, make sure the datasets have comparable coordinates. It is fine to work in geographic coordinates (longitude and latitude) for this case.

Solution

Question: What are the potential problems with this method (see the warnings issued by R)?

  • If defoliation polygons were near the MRC boundary, the intersection in geographical coordinates would not be exact.
  • If polygons overlapped the boundary, the area indicated by SupHaCea would include the part of the polygon outside the MRC.

Here, no polygon approaches the boundary, thus our method is valid. However, if we intersected the defoliation data with all the MRC boundaries, many polygons would be counted twice:

livree_mrc <- st_intersection(livree2017, mrc)
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
## Warning: attribute variables are assumed to be spatially constant
## throughout all geometries
sum(livree2017$SupHaCea) # Sum of defoliated areas in original data
## [1] 305654.6
sum(livree_mrc$SupHaCea) # Sum after intersection
## [1] 383485.2

Returning to the Rouyn-Noranda data, we could use the aggregate function to calculate the defoliated area by intensity level:

aggregate(SupHaCea ~ Niveau, data = livree_rn, sum)
##   Niveau SupHaCea
## 1  Grave 18377.87
## 2  Léger 16187.58
## 3 Modéré 18228.49

Compatibility with other packages

If you have used the data processing package dplyr, note that its functions are compatible with sf spatial datasets. Thus, the last example (using aggregate) could be reproduced with the group_by and summarize functions in dplyr.

The next version of the ggplot2 graphics package will also include functions to easily create thematic maps from sf data.

See this page for more examples integrating sf with those packages.


Thematic maps with tmap

Until now, we visualized spatial data with the plot function. To make more detailed maps, such as for a publication, the tmap package can be very useful. To create a map using tmap, we first define the geometry to display with tm_shape, then we add layers representing different variables. For example, the tm_fill function colors polygons according to their values for a specific variable.

library(tmap)

tm_shape(livree_mrc) +
    tm_fill("Niveau") 

The use of the + symbol to add elements to the map is inspired by the ggplot2 graphical package. Like ggplot2, tmap can also divide the graph into facets based on a variable in the dataset. Here, we use the administrative region indicated in the MRS_NM_REG column.

tm_shape(livree_mrc) +
    tm_fill("Niveau") +
    tm_facets("MRS_NM_REG")

Finally, here is how we can add the MRC boundaries (with tm_polygons) and names (with tm_text) to the map. Note that the elements are drawn in the order specified in the script.

tm_shape(mrc) +
    tm_polygons() +
    tm_text("MRS_NM_MRC") +
tm_shape(livree_mrc) +
    tm_fill("Niveau") +
    tm_facets("MRS_NM_REG")

Review

  • The tmap package allows the creation of maps combining different spatial variables from one or more vector datasets.
  • The first function to call to make a map is tm_shape, specifying the dataset to map.
  • Layers can be added to the map by associating graphical properties (e.g. tm_fill, tm_text) with specific data fields.
  • tm_facets divides the map in multiple images based on the value of a variable.

Raster data example: the Canadian Digital Elevation Model

The cdem folder contains regional data from the Canadian Digital Elevation Model or CDEM.

The CDEM is a raster dataset; the area of Canada is covered by a regular grid and the model associates an elevation value (in meters) to each pixel on that grid. This type of data is analogous to a digital image, to which metadata is added (resolution, spatial extent and coordinate system) so that each pixel can be associated to geographical coordinates.

The base resolution of the CDEM is 1/4800th of a degree and data are available in sections of 2 degrees of longitude by 1 degree of latitude. We will first read the CDEM index, a file indicating the rectangular areas matching each CDEM section. That index is also in the cdem folder. It is in .kml format, which is a vector data format used notably by Google Earth.

cdem_index <- st_read("cdem/cdem_index_250k.kml", stringsAsFactors = FALSE)
## Reading layer `cdem_shp_index' from data source `C:\Users\marchanp\Desktop\atelier_rgeo\cdem\cdem_index_250k.kml' using driver `KML'
## Simple feature collection with 969 features and 2 fields
## geometry type:  GEOMETRYCOLLECTION
## dimension:      XYZ
## bbox:           xmin: -142 ymin: 41 xmax: -52 ymax: 83
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs

Question: Which spatial function would you use to determine which sections of the CDEM cover the MRC of Rouyn-Noranda?

rn <- mrc[mrc$MRS_NM_MRC == "Rouyn-Noranda", ]
cdem_index <- st_transform(cdem_index, st_crs(rn))
cdem_rn <- st_intersection(cdem_index, rn)
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
## Warning: attribute variables are assumed to be spatially constant
## throughout all geometries
cdem_rn$Name
## [1] "031M" "032D"

The MRC overlaps sections 31M and 32D. Those two sections can be found in the cdem folder in GeoTiff (.tif) format. To read and process raster data in R, we will use the raster package.


Working with raster data

We first load one of the two CDEM sections, with the raster function. This function associates the dataset to a raster object, and its metadata can be shown by typing its name on the command line.

library(raster)
## Loading required package: sp
cdem32D <- raster("cdem/cdem_dem_032D.tif")
cdem32D
## class       : RasterLayer 
## dimensions  : 4801, 9601, 46094401  (nrow, ncol, ncell)
## resolution  : 0.0002083333, 0.0002083333  (x, y)
## extent      : -80.0001, -77.9999, 47.9999, 49.0001  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs 
## data source : C:\Users\marchanp\Desktop\atelier_rgeo\cdem\cdem_dem_032D.tif 
## names       : cdem_dem_032D 
## values      : -32768, 32767  (min, max)

These properties include raster dimensions (number of rows and columns), its resolution (in the same units as the coordinate system, here in degrees), its extent (similar to bbox for vector data) and its coordinate reference system (in proj4string format). We can extract these properties separately with the functions dim, res, extent and crs, respectively.

dim(cdem32D)
## [1] 4801 9601    1
extent(cdem32D)
## class       : Extent 
## xmin        : -80.0001 
## xmax        : -77.9999 
## ymin        : 47.9999 
## ymax        : 49.0001

Note: While the proj4string from the CDEM differs from that of the MRC data, NAD83 and GRS80 are based on the same ellipsoid model, so the coordinates are equivalent.

Since raster datasets are generally large (e.g. 88 Mb for each CDEM section), raster doesn’t load all the data in memory, but instead creates a link to the file on disk. Visualization functions likewise only show a stratified sample of the dataset. For example, the histogram of cdem32D values is constructed from 100 000 pixels out of ~46 millions.

hist(cdem32D)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 0% of the raster cells were used. 100000 values used.

This is also the case plot, which shows an image of the raster. As previously, we can overlap a vector dataset with the add = TRUE parameter (reset = FALSE is not needed here).

plot(cdem32D)
plot(mrc[, "geometry"], add = TRUE)

We will now load the second CDEM section, then combine both datasets in a single raster with the merge function.

cdem31M <- raster("cdem/cdem_dem_031M.tif")
cdem <- merge(cdem32D, cdem31M)

We can see that the extents of the two sections overlap by one row of pixels. For those pixels, merge uses the data from the first object (cdem32D), unless they are null (NA).

Depending on your system, it is possible that the new cdem object will be held in memory (350 Mb). Above a certain size, functions in raster save the output in a temporary file to save RAM; this size threshold can be changed with rasterOptions().

To crop a raster’s extent along the boundaries of another spatial object (raster or sf), we use the crop function. Here, we extract a rectangular area around the boundaries of the Rouyn-Noranda MRC.

rn <- mrc[mrc$MRS_NM_MRC == "Rouyn-Noranda", ]
cdem <- crop(cdem, rn)

plot(cdem)
plot(rn[, "geometry"], add = TRUE)

Since raster objects are fundamentally matrices, we can apply the same mathematical operators as regular matrices in R. For example, if X is a matrix in R, X + 5 returns a new matrix where each element in X is incremented by 5, and X-Y returns a matrix where each element in Y is subtracted from the corresponding element in X.

Exercise 4

From cdem, produce (a) a map where elevation is represented in km rather than meters and (b) a map indicating areas of elevation under 300 m.

Solution


To exclude regions with elevations of 300 meters or more, but keep elevation information for other areas, we can use the mask function.

plot(mask(cdem, cdem < 300, maskvalue = FALSE))

The maskvalue indicates that pixels where the masking condition (cdem < 300) is FALSE must be excluded.

Due to time or memory limitations, it is sometimes necessary to reduce the resolution of raster data. The aggregate function reduces the number of pixels by a given factor in each direction, by combining values according to a summary function. For example, this command combine 50x50 pixel regions of the original layer by taking the mean value.

cdem_agg <- aggregate(cdem, fact = 50, fun = mean)
plot(cdem_agg)

Review

  • A raster dataset associates a value to each pixel in a regular grid. The raster package allows us to process this type of data in R.
  • To read raster data: raster.
  • Along with a matrix of values, raster objects have several attributes including an extent (extent), a resolution (res) and a coordinate system (crs).
  • Arithmetic (+, -, etc.) and comparison operators (<, ==, etc.) are applied to each pixel of a raster.
  • mask removes data from a raster for areas meeting a certain condition.
  • crop cuts the raster to a reduced spatial extent.
  • merge pastes together different raster objects.
  • aggregate reduces the resolution of a raster by combining values from adjacent pixels.

Extract raster variables using vector layers

Suppose we want to verify if there is a link between elevation and the intensity of defoliation due to the forest tent caterpillar. To do this, we need to learn a new function, extract, which extracts data from a raster object based on specific spatial locations. For example, this code returns the CDEM value at the location of UQAT (292 m).

extract(cdem, uqat)
##     
## 292

What does extract return if we apply it to a polygon? Let’s try with five defoliation polygons located in Rouyn-Noranda:

livree_rn <- st_intersection(livree2017, rn)
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
## Warning: attribute variables are assumed to be spatially constant
## throughout all geometries
poly_ext <- extract(cdem, livree_rn[1:5, ])

Question: The result in poly_ext is a list of five vectors of different length. What do these values represent?

The result is a list where each vector contains the CDEM values for all pixels within the corresponding polygon; by default, a pixel is included if its center is within the polygon boundaries. To obtain a single value per polygon, we need to specify a summary function as the fun argument. Here, we can take the mean of values within each polygon.

Depending on your computer, this function should take a minute or two to run. Once the mean elevation is computed, we copy the values to a new column in livree_rn.

moy_cdem <- extract(cdem, livree_rn, fun = mean)
livree_rn$cdem <- moy_cdem[, 1]

Finally, let’s compare the distribution of elevation for defoliated areas with different intensity levels.

boxplot(cdem ~ Ia, data = livree_rn)

Review

  • extract pulls values from a raster object based on spatial coordinates.
  • For a set of points, extract returns the raster value at each point.
  • For a set of polygons, extract returns the values for all pixels located within each polygon. These values can be combined into one value per polygon by specifying a summary function.

Interactive maps with mapview

The mapview package provides visualizations of sf and raster layers on an interactive map (similar to Google Maps) with different basemap options (e.g. OpenStreetMap, OpenTopoMap, ESRI World Imagery). It is as simple as calling the mapview function with the name of the spatial dataset. Additional data can be overlaid with the + operator.

library(mapview)

mapview(cdem) + livree_rn[, "Niveau"]
## Warning in rasterCheckSize(x, maxpixels = maxpixels): maximum number of pixels for Raster* viewing is 5e+05 ; 
## the supplied Raster* has 26087052 
##  ... decreasing Raster* resolution to 5e+05 pixels
##  to view full resolution set 'maxpixels =  26087052 '

Additional references


Solutions

Exercise 1

Produce a map with MRCs colored by administrative region, e.g.: Nord-du-Québec, Côte-Nord, etc.

plot(mrc[, "MRS_NM_REG"], key.size = lcm(5))

Return to text

Exercise 2

The st_intersection(A, B) function returns the intersection of two vector datasets A and B, that is, the areas where objects in each dataset overlap. Using this function, determine how many MRCs are located within 100 km of the point uqat, and try to show on a map representing this circular area.

inters <- st_intersection(mrc_proj, rayon)
## Warning: attribute variables are assumed to be spatially constant
## throughout all geometries
plot(inters[, "MRS_NM_MRC"], key.size = lcm(5))

Return to text

Exercise 3

  • Determine the total area (in hectares) defoliated by the forest tent caterpillar in the MRC of Rouyn-Noranda.
  • Produce a map of the defoliated areas with their intensity level for that MRC.
rn <- mrc[mrc$MRS_NM_MRC == "Rouyn-Noranda", ]

# Verify that the CRS are identical
st_crs(rn) == st_crs(livree2017)
## [1] TRUE

# Intersect the defoliation polygons with the MRC boundaries
livree_rn <- st_intersection(livree2017, rn)
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
## Warning: attribute variables are assumed to be spatially constant
## throughout all geometries

# Sum the areas of the polygons
sum(livree_rn$SupHaCea)
## [1] 52793.94


plot(livree_rn[, "Ia"], reset = FALSE)
plot(rn[, "geometry"], add = TRUE, reset = FALSE)
plot(uqat, add = TRUE, pch = 20, col = "red")

Return to text

Exercise 4

From cdem, produce (a) a map where elevation is represented in km rather than meters and (b) a map indicating areas of elevation under 300 m.

plot(cdem / 1000)

plot(cdem < 300)

Return to text